Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study

Background: Previous research applying large language models (LLMs) to medicine has focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability to recognize images.
Objective: We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination.
Methods: We focused on the 108 questions that included 1 or more images and presented GPT-4V with the same questions under 2 conditions: (1) with both the question text and the associated images and (2) with the question text only. We then compared the accuracy between the 2 conditions using the exact McNemar test.
Results: Among the 108 questions with images, GPT-4V’s accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively.
Conclusions: The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.
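
The exact McNemar test named in the Methods compares paired outcomes using only the discordant pairs, that is, questions answered correctly under one condition but not the other. The sketch below shows one way the two-sided exact P value could be computed with the Python standard library. The function name `exact_mcnemar_p` and the discordant counts `b` and `c` are assumptions made for illustration: the paired counts are not reported in this section, and the values used here were chosen only so that their difference (5) matches the reported totals of 73 and 78 correct answers.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts.

    b: items correct only under condition 1 (question text + images)
    c: items correct only under condition 2 (question text only)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely to
    # fall on either side, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for the two-sided P value.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (NOT taken from the paper), illustration only.
b, c = 7, 12
print(f"b={b}, c={c}, exact McNemar P={exact_mcnemar_p(b, c):.2f}")
```

Concordant pairs (questions answered correctly or incorrectly under both conditions) do not enter the calculation, which is why only `b` and `c` are needed.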


Introduction
In this document, we showcase actual examples of inputs and outputs from GPT-4V, accompanied by their English translations. The question texts and images are all cited from the 117th Japanese National Medical Licensing Examination, and all translations were performed by the authors.
Sections where GPT-4V accurately interpreted the images are highlighted in blue. Conversely, sections where its interpretations were incorrect are highlighted in red.

English Translation
Example 1 (Question E7, General)
Unfortunately, as a text-based interface, I do not have the capability to display images or waveforms such as "Supplement No. 2." Therefore, I cannot refer to the specific electrocardiogram waveform.
However, it is possible to infer the clinical situation from the provided options. Based on the electrocardiogram waveform of a patient who has fainted and the response to the listed drugs, I consider the following:
a Atropine: Used for bradycardia or some types of AV block.
b Adrenaline: Used for cardiac arrest or severe hypotension.
c Diazepam: Used for seizures or anxiety.
d Verapamil: Used for certain types of supraventricular tachycardias.
e Lidocaine: Used for bradyarrhythmic or ventricular arrhythmias.
Since the causes of fainting can vary widely, it is not possible to make an accurate choice without seeing the specific electrocardiogram waveform. If you can provide more detailed information about the electrocardiogram waveform or symptoms, I can offer more specific advice.
[Select only one option.]
Understood.
Considering the situation of a patient who has fainted, and taking into account emergencies like cardiac arrest or severe hypotension, b Adrenaline would be the most commonly considered option.
However, to provide an accurate answer, details of the electrocardiogram waveform and the patient's clinical condition are necessary.

GPT-4V Response with Image (Correct)
This electrocardiogram waveform may suggest the presence of a third-degree atrioventricular block (complete heart block). In cases of third-degree AV block, where the heartbeat is slow, atropine may be used.
Therefore, the medication that should be administered immediately is:
a Atropine
However, the interpretation of the electrocardiogram and the choice of treatment should be made by a specialist according to the clinical situation. This information is for educational purposes and should not be used as medical advice.
Example 2 (Question A26, Clinical)
Question
A 32-year-old woman presented with complaints of fever, dizziness, and general fatigue. She had been experiencing a low-grade fever of around 37°C for the past six months, accompanied by general fatigue and dizziness. Despite taking over-the-counter cold medicine, her symptoms did not improve, prompting her visit. Her past medical and family histories were unremarkable. Blood pressure measurements showed 120/62 mmHg in the right arm and 80/40 mmHg in the left arm. Vascular bruit was audible from the neck to the left supraclavicular fossa. A Levine 2/6 diastolic murmur was heard at the left sternal border at the level of the third rib. No abnormalities were noted in respiratory sounds. Erythrocyte sedimentation rate was 85 mm/hour. Blood test results: Red blood cells 3.6 million, Hb 10.8 g/dL, white blood cells 9,600 (band neutrophils 10%, segmented neutrophils 70%, eosinophils 1%, basophils 1%, monocytes 4%, lymphocytes 14%). No abnormalities were found in the electrocardiogram and chest X-ray. A separate three-dimensional contrast-enhanced CT scan is provided (Supplement No.
From these images, it seems that images from a three-dimensional contrast-enhanced CT scan are presented. Taking into account the patient's clinical information and the images, we can suggest the most likely diagnosis.
Firstly, the images show abnormalities in the aorta. In particular, there is dilation and deformation in parts of the aorta.
From the patient's clinical information:
Vascular bruit from the neck to the left supraclavicular fossa
Levine 2/6 diastolic murmur strongest at the left sternal border at the third rib
Combining this information, the following diagnoses can be considered:
a. Coarctation of the aorta: While these symptoms could be associated with a narrowing of the aorta, the images do not show clear signs of constriction.
b. Sarcoidosis: This is primarily a lung-affecting disease, and specific signs of this condition are not indicated in this information.
c. Aneurysm of the aortic arch: Considering the abnormal areas in the images and clinical symptoms, this diagnosis is possible.
d. Stanford Type A aortic dissection: Signs of aortic dissection are also suggested in the images, but the clinical signs and symptoms do not fully align.
e. Takayasu arteritis: This disease involves inflammation of the aorta, but the symptoms and image information alone make a definitive diagnosis challenging.
The most likely diagnosis seems to be c. Aneurysm of the aortic arch, but the actual diagnosis should be based on a professional opinion from a physician.

Question
The electrocardiogram waveform of a patient who has fainted is shown separately (Supplement No. 2). Which of the following should be administered immediately?